A DNA sequencer is a scientific instrument used to automate the DNA sequencing process. Given a sample of DNA, a DNA sequencer determines the order of the four nucleotide bases: G (guanine), C (cytosine), A (adenine) and T (thymine). The result is reported as a text string, called a read. Some DNA sequencers can also be considered optical instruments, as they analyze light signals originating from fluorochromes attached to nucleotides.
The first automated DNA sequencer, invented by Lloyd M. Smith, was introduced by Applied Biosystems in 1987. It used the Sanger sequencing method, a technology which formed the basis of the "first generation" of DNA sequencers and enabled the completion of the Human Genome Project in 2001. First-generation DNA sequencers are essentially automated electrophoresis systems that detect the migration of labelled DNA fragments. These sequencers can therefore also be used for genotyping genetic markers where only the length of the DNA fragments needs to be determined (e.g., AFLPs).
The Human Genome Project spurred the development of cheaper, higher-throughput and more accurate platforms, known as next-generation sequencers (NGS), to sequence the human genome. These include the 454, SOLiD and Illumina DNA sequencing platforms. Next-generation sequencing machines have increased the rate of DNA sequencing substantially compared with the earlier Sanger methods. DNA samples can be prepared automatically in as little as 90 minutes, while a human genome can be sequenced at 15 times coverage in a matter of days.
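The "15 times coverage" figure can be made concrete with the usual back-of-the-envelope estimate: mean coverage is the total number of sequenced bases divided by the genome size. The sketch below assumes a haploid human genome of roughly 3.1 Gb and 100 bp reads. Both values and the function names are illustrative assumptions, not figures taken from the text above.

```python
import math


def mean_coverage(num_reads: int, read_length: int, genome_size: int) -> float:
    """Average number of times each genome position is covered by a read."""
    return num_reads * read_length / genome_size


def reads_needed(target_coverage: float, read_length: int, genome_size: int) -> int:
    """Smallest number of reads that reaches the target mean coverage."""
    return math.ceil(target_coverage * genome_size / read_length)


# Assumed values for illustration only.
HUMAN_GENOME_BP = 3_100_000_000  # approximate haploid human genome size
READ_LENGTH_BP = 100             # a typical short-read length

if __name__ == "__main__":
    n = reads_needed(15, READ_LENGTH_BP, HUMAN_GENOME_BP)
    print(f"{n:,} reads of {READ_LENGTH_BP} bp give about "
          f"{mean_coverage(n, READ_LENGTH_BP, HUMAN_GENOME_BP):.1f}x coverage")
```

Under these assumptions, 15 times coverage corresponds to several hundred million short reads, which is why throughput per run matters so much when comparing platforms.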
More recent, third-generation DNA sequencers such as PacBio SMRT and Oxford Nanopore offer the possibility of sequencing long molecules, compared to short-read technologies such as Illumina SBS or MGI Tech's DNBSEQ.
Because of limitations in DNA sequencer technology, the reads produced by many of these technologies are short compared to the length of a genome, so the reads must be assembled into longer contiguous sequences (contigs). The data may also contain errors, caused by limitations in the DNA sequencing technique or by errors during PCR amplification. DNA sequencer manufacturers use a number of different methods to detect which DNA bases are present. The specific protocols applied in different sequencing platforms have an impact on the final data that is generated. Therefore, comparing data quality and cost across different technologies can be a daunting task. Each manufacturer provides its own way of reporting sequencing errors and quality scores, and errors and scores from different platforms cannot always be compared directly. Since these systems rely on different DNA sequencing approaches, choosing the best DNA sequencer and method will typically depend on the objectives of the experiment and the available budget.
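The idea of assembling short reads into a longer sequence can be illustrated with a toy greedy merge: repeatedly join the two reads whose suffix and prefix overlap the most. This is only a minimal sketch under strong assumptions (error-free reads, a single strand, no repeats). Production assemblers use more elaborate overlap or de Bruijn graph methods, and the function names and example reads here are hypothetical.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `a` that equals a prefix of `b`.

    Returns 0 if the longest such overlap is shorter than `min_len`.
    """
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Merge reads greedily by largest suffix/prefix overlap until none remain."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while True:
        best_k, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            return contigs  # no remaining pair overlaps by at least min_len
        merged = contigs[best_i] + contigs[best_j][best_k:]
        contigs = [c for idx, c in enumerate(contigs) if idx not in (best_i, best_j)]
        contigs.append(merged)


if __name__ == "__main__":
    reads = ["ATGGCGT", "GCGTACG", "TACGTTA"]  # made-up example reads
    print(greedy_assemble(reads))  # ['ATGGCGTACGTTA']
```

With real data, sequencing errors and repeated regions make the overlaps ambiguous, which is one reason read length and error rate both influence how well a genome can be reconstructed.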
More recently, a third generation of DNA sequencers was introduced. The sequencing methods applied by these sequencers do not require DNA amplification by the polymerase chain reaction (PCR), which speeds up sample preparation before sequencing and reduces errors. In addition, sequencing data is collected in real time from the reactions that occur as nucleotides are added to the complementary strand. Two companies introduced different approaches in their third-generation sequencers. Pacific Biosciences sequencers use a method called single-molecule real-time (SMRT) sequencing, in which sequencing data is produced from light (captured by a camera) that is emitted when an enzyme adds a fluorescently labelled nucleotide to the complementary strand. Oxford Nanopore Technologies is developing third-generation sequencers that use electronic systems based on nanopore sensing technologies.
Roche currently manufactures two systems based on its pyrosequencing technology: the GS FLX+ and the GS Junior System. The GS FLX+ System promises read lengths of approximately 1000 base pairs, while the GS Junior System promises 400 base pair reads. A predecessor to the GS FLX+, the 454 GS FLX Titanium system, was released in 2008, achieving an output of 0.7 Gb of data per run, with 99.9% accuracy after quality filtering and a read length of up to 700 bp. In 2009, Roche launched the GS Junior, a benchtop version of the 454 sequencer with a read length of up to 400 bp and simplified library preparation and data processing.
One of the advantages of 454 systems is their running speed. Manpower can be reduced through automation of library preparation and semi-automation of emulsion PCR. A disadvantage of the 454 system is that it is prone to errors when estimating the number of bases in a long string of identical nucleotides. This is referred to as a homopolymer error and occurs when there are 6 or more identical bases in a row. Another disadvantage is that reagents are more expensive than those of other next-generation sequencers.
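To see where homopolymer errors could arise, one can scan a read for runs of 6 or more identical bases, the threshold mentioned above. The following is a minimal sketch. The function name and the example read are made up for illustration and are not part of any Roche software.

```python
from itertools import groupby


def homopolymer_runs(read: str, min_len: int = 6) -> list[tuple[int, str, int]]:
    """Return (start_index, base, run_length) for each run of >= min_len identical bases."""
    runs = []
    pos = 0
    for base, group in groupby(read):
        length = sum(1 for _ in group)  # size of this run of identical bases
        if length >= min_len:
            runs.append((pos, base, length))
        pos += length
    return runs


if __name__ == "__main__":
    # Made-up read: 7 T's starting at index 3 and 6 A's starting at index 12.
    read = "ACG" + "T" * 7 + "GC" + "A" * 6 + "C"
    print(homopolymer_runs(read))  # [(3, 'T', 7), (12, 'A', 6)]
```

Positions flagged this way are the ones where the reported run length from a pyrosequencing read deserves extra scrutiny.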
In 2013, Roche announced that it would shut down development of 454 technology and phase out 454 machines completely in 2016, as the technology had become noncompetitive.
Roche produces a number of software tools that are optimised for the analysis of 454 sequencing data.
The technology behind Illumina's DNA sequencers was first released by Solexa in 2006 as the Genome Analyzer. Illumina purchased Solexa in 2007. The Genome Analyzer uses a sequencing-by-synthesis method. The first model produced 1 Gb per run. During 2009, the output was increased from 20 Gb per run in August to 50 Gb per run in December. In 2010, Illumina released the HiSeq 2000, with an output of initially 200 Gb and later 600 Gb per run, each run taking 8 days. At its release, the HiSeq 2000 was one of the cheapest sequencing platforms, at $0.02 per million bases as costed by the Beijing Genomics Institute.
In 2011, Illumina released a benchtop sequencer called the MiSeq. At its release, the MiSeq could generate 1.5 Gb per run with paired-end 150 bp reads. A sequencing run can be performed in 10 hours when using automated DNA sample preparation.
The Illumina HiSeq uses two software tools to calculate the number and position of DNA clusters to assess the sequencing quality: the HiSeq control system and the real-time analyzer. These methods help to assess if nearby clusters are interfering with each other.
The SOLiD technology was acquired by Applied Biosystems in 2006. SOLiD applies sequencing by ligation and dual-base encoding. The first SOLiD system was launched in 2007, generating read lengths of 35 bp and 3 Gb of data per run. After five upgrades, the 5500xl sequencing system was released in 2010, considerably increasing read length to 85 bp, improving accuracy up to 99.99% and producing 30 Gb per 7-day run.
The limited read length of the SOLiD has remained a significant shortcoming and has, to some extent, limited its use to experiments where read length is less vital, such as resequencing and transcriptome analysis and, more recently, ChIP-Seq and methylation experiments. DNA sample preparation for SOLiD systems has become much quicker with the automation of sequencing library preparation, for example on the Tecan system.
The colour space data produced by the SOLiD platform can be decoded into DNA bases for further analysis. However, software that considers the original colour space information can give more accurate results. Life Technologies has released BioScope, a data analysis package for resequencing, ChIP-Seq and transcriptome analysis. It uses the MaxMapper algorithm to map the colour space reads.
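Dual-base encoding and its decoding can be sketched in a few lines. In the commonly described scheme, each colour (0 to 3) encodes a pair of adjacent bases, and with the bases numbered A=0, C=1, G=2, T=3 the colour equals the XOR of the two base codes. Decoding needs one known leading base. The sketch below assumes that the leading base is supplied as the first character of the read. The function names are hypothetical and this is not the BioScope or MaxMapper implementation.

```python
BASE_TO_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
CODE_TO_BASE = "ACGT"


def encode_colour_space(seq: str) -> str:
    """Return the first base followed by one colour digit per adjacent base pair."""
    colours = [BASE_TO_CODE[a] ^ BASE_TO_CODE[b] for a, b in zip(seq, seq[1:])]
    return seq[0] + "".join(str(c) for c in colours)


def decode_colour_space(cs_read: str) -> str:
    """Naively decode a colour-space read (known leading base + colour digits)."""
    bases = [cs_read[0]]
    for colour in cs_read[1:]:
        prev = BASE_TO_CODE[bases[-1]]
        bases.append(CODE_TO_BASE[prev ^ int(colour)])  # next base from previous base and colour
    return "".join(bases)


if __name__ == "__main__":
    cs = encode_colour_space("ATGGCA")
    print(cs)                             # A31031
    print(decode_colour_space(cs))        # ATGGCA
    # One wrong colour (second digit 1 -> 2) corrupts every base after it.
    print(decode_colour_space("A32031"))  # ATCCGT
```

The last call shows why naive decoding is fragile: a single miscalled colour shifts every subsequent base, whereas software that works directly in colour space can recognise it as an isolated measurement error.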
Comparing metrics and performance of next-generation DNA sequencers.

| Manufacturer | Ion Torrent (Life Technologies) | 454 Life Sciences (Roche) | Illumina | Applied Biosystems (Life Technologies) | Pacific Biosciences | Applied Biosystems (Life Technologies) | MGI |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Sequencing chemistry | Ion semiconductor sequencing | Pyrosequencing | Polymerase-based sequencing by synthesis | Ligation-based sequencing | Phospholinked fluorescent nucleotides | Dideoxy chain termination | Polymerase-based sequencing by synthesis |
| Amplification approach | Emulsion PCR | Emulsion PCR | Bridge amplification | Emulsion PCR | Single-molecule; no amplification | PCR | DNA nanoball (DNB) generation |
| Data output per run | 100–200 Mb | 0.7 Gb | 600 Gb | 120 Gb | 0.5–1.0 Gb | 1.9–84 kb | 1440 Gb / 1500–1800 M reads |
| Accuracy | 99% | 99.9% | 99.9% | 99.94% | 88.0% (>99.9999% CCS or HGAP) | 99.999% | 99.90% |
| Time per run | 2 hours | 24 hours | 3–10 days | 7–14 days | 2–4 hours | 20 minutes – 3 hours | 3–5 days |
| Read length | 200–400 bp | 700 bp | 100x100 bp paired end | 50x50 bp paired end | 14,000 bp (N50) | 400–900 bp | 100/150/200 bp paired end |
| Cost per run | US$350 | US$7,000 | US$6,000 (30x human genome) | US$4,000 | US$125–300 | US$4 (single read/reaction) | N/A |
| Cost per Mb | US$1.00 | US$10 | US$0.07 | US$0.13 | US$0.13–0.60 | US$2400 | US$0.007 |
| Cost per instrument | US$80,000 | US$500,000 | US$690,000 | US$495,000 | US$695,000 | US$95,000 | N/A |
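One common way to put accuracy figures on a single scale is the Phred-style quality score, Q = -10 · log10(p), where p is the error probability. The sketch below converts the accuracy percentages from the table, under the assumption that they can be read as per-base accuracies, which the table itself does not state. The dictionary labels are shorthand chosen here for illustration.

```python
import math


def phred_from_accuracy(accuracy_percent: float) -> float:
    """Phred-scaled quality Q = -10 * log10(error probability)."""
    error_prob = (100.0 - accuracy_percent) / 100.0
    if error_prob <= 0:
        raise ValueError("accuracy must be below 100%")
    return -10.0 * math.log10(error_prob)


# Accuracy values copied from the table; treating them as per-base
# accuracies is an assumption made for this illustration.
TABLE_ACCURACY = {
    "Ion Torrent": 99.0,
    "454 (Roche)": 99.9,
    "Illumina": 99.9,
    "SOLiD (ligation-based)": 99.94,
    "PacBio (raw reads)": 88.0,
    "Sanger (dideoxy)": 99.999,
    "MGI": 99.90,
}

if __name__ == "__main__":
    for platform, accuracy in TABLE_ACCURACY.items():
        print(f"{platform:24s} {accuracy:7.3f}%  ~Q{phred_from_accuracy(accuracy):.1f}")
```

On this scale, 99% accuracy corresponds to about Q20 and 99.9% to about Q30. Because each platform derives its own scores differently, such converted values are only a rough guide for comparison.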